ABSTRACT

The power of the classical Linear Discriminant Function (LDF) is compared, using Monte Carlo techniques, with five other procedures for classifying observations from certain non-normal distributions. The alternative procedures considered are the Quadratic Discriminant Function, a Nearest Neighbor Procedure with Probability Blocks, and three density estimators. Comparisons of misclassifications are examined for varying sample sizes for two and three dimensional models. Three types of distributions are considered: finite range (Logit Normal), semi-infinite range (Log Normal), and infinite range (Inverse Hyperbolic Sine Normal). Results indicate that certain alternatives to the LDF classify observations correctly in a greater proportion than does the LDF for non-normal data, and that different procedures are best for different types of
non-normality. (Author)

Introduction

The problem of discrimination (or classification) has always been one of major concern to the behavioral scientist, and one for which there has not always been a satisfactory solution. The discrimination problem arises when the researcher must rationally assign an individual or object to one of a finite number of populations on the basis of a series of measurements obtained on that individual or object, as well as any other pertinent information available.

Fisher (1936) presented the first clear solution to the classification problem. Fisher's solution, called the Linear Discriminant Function (LDF), was the linear combination of the measurements which maximized the ratio of the difference between the sample means to the standard deviation within samples. The inception of a theoretical solution to the problem emerged when the hypothesis testing concepts of Neyman and Pearson were adapted to the discrimination problem by Welch (1939). Welch noted that discrimination procedures classifying new p-dimensional observations were equivalent to partitionings of the sample space into mutually exclusive and exhaustive regions $R_i$.

For a specific partitioning of the sample space, the observation Z to be classified is assigned to population $S_i$ if the p-dimensional point lies in region $R_i$. Using this rationale, there are two possible types of errors that can be committed for the two population problem when classifying the observations.

1. The observation can be classified as originating from population $S_1$ when it actually comes from $S_2$.

2. The observation can be classified as originating from population $S_2$ when it actually comes from $S_1$.

Associated with each of these errors is a probability of committing the error (called the probabilities of misclassification). Welch (1939) proposed that the optimum classification procedure be that procedure which partitions the sample space in such a manner that the corresponding probabilities of misclassification are minimized.

Welch showed that the optimal partitioning is effected by forming the ratio of the densities of the two populations, $f_1(x)/f_2(x)$. The observation to be classified is assigned to population $S_1$ if the value of the likelihood ratio is greater than some appropriately determined constant $k$, and the observation is assigned to population $S_2$ if the value of the likelihood ratio is less than $k$.
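The likelihood ratio rule can be illustrated with a minimal sketch. It assumes, purely for illustration, two univariate normal densities with hypothetical parameters; the names `normal_pdf` and `classify_by_likelihood_ratio` and the threshold `k = 1` are not taken from the study.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution (illustrative choice)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def classify_by_likelihood_ratio(x, f1, f2, k=1.0):
    """Assign x to population 1 if f1(x)/f2(x) > k, otherwise to population 2.

    The comparison f1 > k * f2 avoids dividing by a zero density. A tie is
    sent to population 2 here for simplicity; a randomized rule could be
    used instead.
    """
    return 1 if f1(x) > k * f2(x) else 2

# Hypothetical populations: S1 ~ N(1, 1) and S2 ~ N(0, 1).
f1 = lambda x: normal_pdf(x, 1.0, 1.0)
f2 = lambda x: normal_pdf(x, 0.0, 1.0)

labels = [classify_by_likelihood_ratio(x, f1, f2) for x in (-1.0, 0.2, 0.8, 2.5)]
print(labels)
```

With equal variances and k = 1 the ratio exceeds one exactly when x lies above the midpoint 0.5 of the two means, so the partition of the sample space is a single cut point.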

Anderson (1951) has shown that the discrimination problem can be thought of as a problem of "statistical decision functions": there are a finite number of hypotheses, each hypothesis stating that the distribution of the observation is a specified one; one of the hypotheses is not rejected, the remainder are. Anderson (1951) has shown that a good classification procedure is one which minimizes the "cost" (loss function) of misclassification associated with the procedure. The decision theoretic objective is one of determining an appropriate classification rule that will minimize the risk (expected loss) associated with the procedure.

The procedure that minimizes the risk function for given a priori probabilities (the a priori probability that the observation to be classified belongs to population $S_i$) is a Bayes procedure. When the a priori probabilities are not known, von Mises (1945) has shown that the procedure which maximizes the minimum probability of correct classification (the minimax procedure) effects the best partitioning of the sample space.

In either situation (whether the a priori probabilities are known or unknown), the form of the best classification procedure is the ratio of the density functions of the populations. When the a priori probabilities are equal and the costs of misclassification are equal, the best procedure is such that an observation Z is classified as belonging to $S_1$ if $f_1(z)/f_2(z) > 1$ and Z is classified into $S_2$ if $f_1(z)/f_2(z) < 1$. When the likelihood ratio is equal to one, a randomized procedure is used to classify the observations.

Frequently behavioral scientists collect data that are representative of multivariate normal distributions. For the two population situation, when the populations are multivariate normal with a common variance-covariance matrix and known mean vectors, the optimal classification procedure (the ratio of the densities) simplifies, on taking logarithms, to:

$$L = Z'\Sigma^{-1}(\mu_1 - \mu_2) - \tfrac{1}{2}(\mu_1 + \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \qquad (1)$$

The first term of (1) is that to which Fisher's LDF reduces if the population means and common variance-covariance matrix are known. Anderson (1951) has shown that the optimal probability of misclassification associated with (1), assuming equal costs of misclassification and equal a priori probabilities, is $\Phi(-\Delta/2)$, where $\Delta^2 = (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$ and $\Phi(\cdot)$ is the cumulative normal distribution function.
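As a small worked example, the optimal error rate $\Phi(-\Delta/2)$ can be evaluated numerically. The sketch below assumes an identity variance-covariance matrix, so that $\Delta$ is simply the Euclidean distance between the mean vectors; the function names are hypothetical.

```python
import math

def standard_normal_cdf(t):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def optimal_error(mu1, mu2):
    """Phi(-Delta/2) when the common covariance matrix is the identity,
    so Delta^2 reduces to the squared Euclidean distance between the means."""
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu1, mu2)))
    return standard_normal_cdf(-delta / 2.0)

# Mean vectors (mu, 0) and (0, 0) for mu = 1, 2, 3.
for mu in (1.0, 2.0, 3.0):
    print(mu, round(optimal_error([mu, 0.0], [0.0, 0.0]), 4))
# prints 0.3085, 0.1587 and 0.0668 for mu = 1, 2 and 3
```

Because only the first component of the mean differs, adding further zero-mean coordinates leaves $\Delta$ and hence the optimal error unchanged.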



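When the variance-covariance matrices of the two multivariate normal populations are not identical, the form of the likelihood ratio is not a linear function, but instead a quadratic function, called the Quadratic Discriminant Function.

$$QDF = \tfrac{1}{2}Z'(\Sigma_2^{-1} - \Sigma_1^{-1})Z + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})Z + \tfrac{1}{2}\left(\mu_2'\Sigma_2^{-1}\mu_2 - \mu_1'\Sigma_1^{-1}\mu_1\right) + \tfrac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} \qquad (2)$$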
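The quadratic rule is the logarithm of the ratio of two normal densities, $\log[f_1(z)/f_2(z)]$. A minimal sketch follows, under the simplifying assumption that both variance-covariance matrices are diagonal (independent coordinates); the function name `qdf_diagonal` and all parameter values are hypothetical.

```python
import math

def qdf_diagonal(z, mu1, var1, mu2, var2):
    """Log of f1(z)/f2(z) for two normal populations whose covariance
    matrices are diagonal (var1 and var2 hold the diagonal entries)."""
    q = 0.0
    for zj, m1, v1, m2, v2 in zip(z, mu1, var1, mu2, var2):
        q -= 0.5 * (zj - m1) ** 2 / v1   # squared distance to population 1
        q += 0.5 * (zj - m2) ** 2 / v2   # squared distance to population 2
        q += 0.5 * math.log(v2 / v1)     # half the log ratio of determinants
    return q

# Hypothetical parameters: the populations differ in mean and variance
# on the first coordinate only.
mu1, var1 = [2.0, 0.0], [4.0, 1.0]
mu2, var2 = [0.0, 0.0], [1.0, 1.0]

for z in ([3.0, 0.0], [0.0, 0.0], [-4.0, 0.0]):
    q = qdf_diagonal(z, mu1, var1, mu2, var2)
    print(z, "S1" if q > 0 else "S2")
```

The third point lies far from both means yet is assigned to the first population, because that population has the larger variance; a linear rule cannot produce such a two-sided region.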

It is not unusual that researchers must consider situations in which the distributions from which their data are drawn are not completely specified, but instead are known except for one or more parameters. When these circumstances arise, the unknown parameters of the distributions must be estimated from samples. Then classification procedures are developed which are based on the sample estimates.

To determine appropriate sample based classification procedures it is appropriate to select those procedures whose risk functions asymptotically approach the risk function of the optimal procedure (i.e., those sample based procedures which are consistent). Hoel and Peterson (1949) intuitively reasoned that the best sample based procedure would be of the type of the likelihood ratio in which the sample estimates replaced the unknown parameters (called a "plug-in" procedure). Fix and Hodges (1951) showed that the "plug-in" procedure of Hoel and Peterson was a consistent procedure and was the most appropriate sample based technique to use.

For the case in which the parameters of the multivariate normal distribution are unknown, Anderson (1951) developed a statistic (W) which is the "plug-in" analogue to the LDF. The W statistic is of the form of (1) in which the maximum likelihood estimates replace the unknown parameters. The first term of Anderson's W statistic is the form of the LDF first obtained by Fisher (1936). Using similar arguments, the form of the QDF when based on sample estimates is identical to (2) with $\bar{X}_i$ replacing $\mu_i$ and $\mathbf{S}_i$ replacing $\Sigma_i$.

Historically, the Linear Discriminant Function (or the Anderson W statistic) has been used almost exclusively for discrimination problems regardless of whether the assumption of multivariate normality of the underlying populations has been satisfied. However, Lachenbruch, Sneeringer, and Revo (1973) have shown that the LDF is clearly not a robust procedure when used to classify observations from non-normal distributions. Because of this it is inappropriate to use the LDF with data that are not representative of a multivariate normal distribution. When the samples are drawn from populations that are of some known distribution, the optimal procedure is the ratio of the densities. In most situations, however, information is obtained from samples drawn from unknown populations and alternative distribution-free classification methods should be employed.

The most desirable type of nonparametric classification procedure is a procedure which is consistent with the likelihood ratio procedure. Fix and Hodges (1951) considered a solution to the nonparametric discrimination problem based on estimates of the unknown densities, and used these estimates as "plug-in" versions of the likelihood ratio procedure. Alternative types of nonparametric classification procedures suggested include Nearest Neighbor type procedures and certain methods based on nonparametric rank tests.

It was the purpose of the research described in this paper to empirically contrast the discriminatory power of alternative two population classification procedures to the classical LDF when classifying data that originate from certain types of non-normal distributions.

Model and Methodology

Let $X_1$ and $X_2$ be two absolutely continuous p-dimensional random variables, with probability density functions given by $f_1(x)$ and $f_2(x)$, respectively. Using Monte Carlo techniques, samples from four types of p-dimensional distributions, all of whose dimensions were independent, were generated for the two population discrimination problem. The four distributions included the multivariate normal distribution and non-normal representatives from three classes of distribution: 1) distributions with finite range; 2) distributions with semi-infinite range; and 3) distributions with infinite range.

The three non-normal distributions were generated from the Johnson (1949) system of distributions. The distributions were the Log Normal distribution, the Logit Normal distribution, and the Inverse Hyperbolic Sine Normal distribution. To obtain the required non-normal samples, normally distributed random variables were generated and the appropriate inverse transformation performed. The Johnson system of transformations and inverse transformations is summarized in Table 1.

In Table 1, the variable y is normally distributed with given mean and variance; the variable x is distributed according to the appropriate non-normal distribution. To obtain random points from a normal distribution, uniform random deviates were generated from the IBM Scientific Subroutine Package. Then, using the Central Limit Theorem, the normally distributed random data points were determined.¹

TABLE 1

TRANSFORMATIONS (AND THEIR INVERSES) THAT GENERATE THE JOHNSON (1949) SYSTEM OF DISTRIBUTIONS

Distribution                      Transformation                        Inverse
Log Normal                        y = log(x),          0 < x < ∞        x = exp(y)
Logit Normal                      y = log(x/(1 - x)),  0 < x < 1        x = exp(y)/(1 + exp(y))
Inverse Hyperbolic Sine Normal    y = sinh^{-1}(x),    -∞ < x < ∞       x = sinh(y)
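The generation scheme can be sketched as follows. The sum of twelve uniform deviates is one common way of applying the Central Limit Theorem and is an assumption here, as are the function names, the seed, and the use of Python's `random` module in place of the IBM Scientific Subroutine Package. The logit inverse is written in the algebraically equivalent form 1/(1 + exp(-y)).

```python
import math
import random

def clt_normal(rng, mean=0.0, sd=1.0):
    """Approximate normal deviate: the sum of 12 uniform(0, 1) deviates has
    mean 6 and variance 1, so subtracting 6 gives roughly N(0, 1)."""
    return mean + sd * (sum(rng.random() for _ in range(12)) - 6.0)

# Inverse transformations: map a normal variate y to the non-normal x.
def to_log_normal(y):
    return math.exp(y)                   # inverse of y = log(x)

def to_logit_normal(y):
    return 1.0 / (1.0 + math.exp(-y))    # inverse of y = log(x / (1 - x))

def to_sinh_normal(y):
    return math.sinh(y)                  # inverse of y = asinh(x)

rng = random.Random(1949)  # arbitrary seed, chosen only for repeatability
ys = [clt_normal(rng, mean=1.0) for _ in range(5)]
sample = [(to_log_normal(y), to_logit_normal(y), to_sinh_normal(y)) for y in ys]
for row in sample:
    print(row)
```

Each row holds one normal deviate pushed through the three inverse transformations, so the first entry is positive, the second lies strictly between 0 and 1, and the third is unrestricted.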



The p-dimensional normal distributions which were used to generate the non-normal distributions had the identity matrix as their variance-covariance matrix. The mean vector for population $S_1$ was (μ, 0, ..., 0) and for population $S_2$ was (0, ..., 0). For each of the four distributions, samples were generated for each combination of sample size (n = 64, 200, 729), first component of the mean vector for population $S_1$ (μ = 1, 2, 3), and dimensionality (p = 2, 3).

For each of the eighteen possible combinations of the parameters for each of the four distributions, six different classification rules were developed. The classification procedures considered were:

1. Linear Discriminant Function (Anderson W Statistic)

2. Quadratic Discriminant Function

3. Nearest Neighbor with Probability Blocks

4. Parzen-Cacoullos Density Estimator

5. Loftsgaarden-Quesenberry Density Estimator

6. Gessaman Density Estimator

¹ There has been in the past some criticism leveled as to the validity of IBM's Scientific Subroutine Package random number generator. Using Chi-Square tests described by MacLaren and Marsaglia (1965), it was determined that the uniform random deviates generated from the program were random. Further, using the Kolmogorov-Smirnov test, it was determined that the random variables obtained from the uniform random deviates were representative of a normal distribution.

Linear Discriminant Function

The Anderson W Statistic was used, assuming equal a priori probabilities and equal costs of misclassification. For this situation, the Anderson W statistic is of the form:

$$W = Z'\mathbf{S}^{-1}(\bar{X}_1 - \bar{X}_2) - \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)'\mathbf{S}^{-1}(\bar{X}_1 - \bar{X}_2) \qquad (3)$$

The observations are classified as belonging to $S_1$ if W > 0; as belonging to $S_2$ if W < 0.
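A minimal sketch of how the W statistic might be computed from two training samples is given below. It is not the study's FORTRAN IV code: the helper names, the tiny data set, and the pooled covariance divisor n1 + n2 - 2 are assumptions (a maximum likelihood divisor would rescale W by a positive constant without changing its sign).

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(x1, x2):
    """Pooled within-sample covariance matrix with divisor n1 + n2 - 2."""
    p = len(x1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x1, x2):
        m = mean_vector(rows)
        for r in rows:
            d = [a - b for a, b in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x1) + len(x2) - 2
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def anderson_w(z, x1, x2):
    """W = (z - (m1 + m2)/2)' S^{-1} (m1 - m2), an equivalent form of W."""
    m1, m2 = mean_vector(x1), mean_vector(x2)
    coef = solve(pooled_covariance(x1, x2), [a - b for a, b in zip(m1, m2)])
    mid = [(a + b) / 2.0 for a, b in zip(m1, m2)]
    return sum(c * (zi - mi) for c, zi, mi in zip(coef, z, mid))

# Hypothetical training samples for populations S1 and S2 (p = 2).
x1 = [[2.0, 0.1], [3.1, -0.4], [2.4, 0.6], [3.5, 0.2]]
x2 = [[0.2, 0.3], [-0.5, -0.2], [0.9, 0.5], [0.1, -0.6]]
print("S1" if anderson_w([2.5, 0.0], x1, x2) > 0 else "S2")
```

Solving the linear system once for the coefficient vector avoids forming the inverse matrix explicitly; the same coefficients can then be reused for every new observation.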
Quadratic Discriminant Function

Table 2 illustrates the means and variances of the three non-normal distributions. Clearly, the variances for $S_2$ are markedly different from those for $S_1$. Therefore, it was appropriate to consider classification according to the Quadratic Discriminant Function. The form of the QDF, assuming equal a priori probabilities and equal costs of misclassification, is:

$$QDF = \tfrac{1}{2}Z'(\mathbf{S}_2^{-1} - \mathbf{S}_1^{-1})Z + (\bar{X}_1'\mathbf{S}_1^{-1} - \bar{X}_2'\mathbf{S}_2^{-1})Z + \tfrac{1}{2}\left(\bar{X}_2'\mathbf{S}_2^{-1}\bar{X}_2 - \bar{X}_1'\mathbf{S}_1^{-1}\bar{X}_1\right) + \tfrac{1}{2}\log\frac{|\mathbf{S}_2|}{|\mathbf{S}_1|} \qquad (4)$$

The new observations are classified into $S_1$ if QDF > 0; into $S_2$ if QDF < 0.

TABLE 2

MEANS AND VARIANCES OF THE NON-NORMAL DISTRIBUTIONS FOR SPECIFIED MEANS OF THE NORMAL DISTRIBUTION

Distribution                      μ_y     μ_x       σ_x²
Log Normal                        0       1.65      [illegible]
                                  1       4.48      34.51
                                  2       12.18     255.02
                                  3       33.12     1884.32
Logit Normal                      0       .50       .043
                                  1       .70       .029
                                  2       .84       .015
                                  3       .94       .001
Inverse Hyperbolic Sine Normal    0       0         3.19
                                  1       1.94      9.65
                                  2       5.98      74.63
                                  3       16.52     471.94

SOURCE: Lachenbruch, Sneeringer and Revo. Robustness of the Linear and Quadratic Discriminant Function to Certain Types of Non-Normality. Communications in Statistics, 1973, 1, 54.

σ_y² is the variance of the underlying normal distribution.
μ_y is the mean of the underlying normal distribution.
μ_x is the mean of the transformed non-normal variate.
σ_x² is the variance of the transformed non-normal variate.

Nearest Neighbor with Probability Blocks

The Nearest Neighbor with Probability Blocks procedure is based on distribution-free tolerance regions. To obtain the necessary probability blocks, a procedure outlined by Gessaman and Gessaman (1972) was employed. Assume, without loss of generality, that p = 2; the generalization to general p space is immediate. Let $k = [n^{(p-1)/(p+1)}]$, the greatest integer less than or equal to $n^{(p-1)/(p+1)}$.

The observations were ranked along the first coordinate and the plane partitioned into $[(n/k)^{1/2}]$ "blocks" by making $[(n/k)^{1/2}] - 1$ evenly spaced "cuts" on the ranked observations. Each cut-point belongs to the right boundary of the block which it forms. Since the distributions are absolutely continuous, ties occur with probability zero.

The observations used to make the cuts were deleted. Then, taking each block, the remaining observations were partitioned into $[(n/k)^{1/2}]$ subblocks by making $[(n/k)^{1/2}] - 1$ evenly spaced cuts on the second coordinate. The plane was thus partitioned into $[(n/k)^{1/2}]^2$ probability blocks, each containing k-1, k, or k+1 observations.
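The block counts implied by this construction can be tabulated with a short sketch. It takes k to be the greatest integer not exceeding n^((p-1)/(p+1)) and, as an assumption extending the p = 2 description, uses the p-th root of n/k for the number of blocks per coordinate; the function names are hypothetical.

```python
import math

def floor_power(base, exponent):
    """Greatest integer <= base**exponent, guarding against floating-point
    results such as 64 ** (1/3) = 3.9999999999999996."""
    return math.floor(base ** exponent + 1e-9)

def block_layout(n, p=2):
    """Return (k, blocks per coordinate, total number of blocks)."""
    k = floor_power(n, (p - 1) / (p + 1))
    per_axis = floor_power(n / k, 1.0 / p)
    return k, per_axis, per_axis ** p

for n in (64, 200, 729):
    print(n, block_layout(n))
# prints (4, 4, 16), (5, 6, 36) and (9, 9, 81) for n = 64, 200 and 729
```

For p = 2 the sample sizes 64 and 729 are perfect cubes, so k and the number of blocks per coordinate come out as exact integers (4 and 9); the small tolerance in `floor_power` is needed only because cube roots are not computed exactly in floating point.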

Once the probability blocks were determined, the observations in $X_1$ and $X_2$ used to develop them were classified into the blocks. A block was considered to be an $X_1$ block if the majority of the observations in the block were $X_1$ observations; an $X_2$ block if the majority of the observations in the block were $X_2$ observations. If a block had an equal number of observations from both populations, the block was classified according to its neighboring blocks.

Once the membership of each of the blocks was ascertained, the new observations were classified and the number of misclassifications determined.
Density Estimators

For $\hat{f}_1(x)$ and $\hat{f}_2(x)$ consistent estimators of $f_1(x)$ and $f_2(x)$, respectively, the procedure used was the ratio of the density estimators. The new observations are classified into $S_1$ if the ratio is greater than one; into $S_2$ if the ratio is less than one.

Parzen-Cacoullos Density Estimator

$$\hat{f}_i(x) = \frac{1}{n\,h^p(n)} \sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h(n)}\right) \qquad (5)$$

where $h(n) = n^{-a}$ (the exponent is illegible in the source) and $K(w) = \exp[-(w_1^2 + \cdots + w_p^2)/2]/(2\pi)^{p/2}$.
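A kernel estimator of this type, with the Gaussian product kernel K, can be sketched as below. The bandwidth `h` is left as an explicit parameter and the value 0.5 is hypothetical, as are the function names and the small training samples.

```python
import math

def gaussian_kernel(w):
    """K(w) = exp(-(w_1^2 + ... + w_p^2) / 2) / (2 pi)^(p/2)."""
    p = len(w)
    return math.exp(-0.5 * sum(c * c for c in w)) / (2.0 * math.pi) ** (p / 2.0)

def parzen_density(x, sample, h):
    """Kernel density estimate at x from a list of p-dimensional points."""
    n, p = len(sample), len(x)
    total = sum(gaussian_kernel([(a - b) / h for a, b in zip(x, xj)])
                for xj in sample)
    return total / (n * h ** p)

def classify(x, sample1, sample2, h):
    """Assign x to S1 when the estimated density ratio exceeds one."""
    return 1 if parzen_density(x, sample1, h) > parzen_density(x, sample2, h) else 2

# Hypothetical training samples for the two populations.
sample1 = [[2.1, 0.2], [1.8, -0.3], [2.4, 0.1]]
sample2 = [[0.1, 0.0], [-0.2, 0.4], [0.3, -0.2]]
print(classify([2.0, 0.0], sample1, sample2, h=0.5))
print(classify([0.0, 0.0], sample1, sample2, h=0.5))
```

Comparing the two density estimates directly, rather than forming their quotient, gives the same decision and avoids division by a vanishing estimate far from both samples.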
Loftsgaarden-Quesenberry Density Estimator

$$\hat{f}_i(x) = \frac{k - 1}{nA} \qquad (6)$$

where $k = n^{1/2}$, $A = \dfrac{2r^p\pi^{p/2}}{p\,\Gamma(p/2)}$, and $r$ is the distance from the new observation to the $k$th closest $x_j$ as determined by Euclidean distance.
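The nearest neighbor density estimate (k - 1)/(nA) can be sketched as follows, where A is the volume of the p-dimensional sphere of radius r. Truncating n^(1/2) to an integer with `math.isqrt` and forcing k to be at least 2 are assumptions, as are the function names and the toy sample.

```python
import math

def hypersphere_volume(r, p):
    """A = 2 r^p pi^(p/2) / (p Gamma(p/2))."""
    return 2.0 * r ** p * math.pi ** (p / 2.0) / (p * math.gamma(p / 2.0))

def lq_density(x, sample, k=None):
    """Estimate (k - 1) / (n A), with r the distance to the k-th closest point."""
    n, p = len(sample), len(x)
    if k is None:
        k = max(2, math.isqrt(n))      # k = n^(1/2), truncated to an integer
    dists = sorted(math.dist(x, xj) for xj in sample)
    r = dists[k - 1]                   # distance to the k-th closest point
    return (k - 1) / (n * hypersphere_volume(r, p))

# Toy sample: four points on the unit circle around the origin.
sample = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(lq_density([0.0, 0.0], sample))
```

In this toy case k = 2 and r = 1, so the estimate equals 1/(4π). With continuous data the k-th distance is zero with probability zero, so no guard against a zero radius is included.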

Gessaman Density Estimator

$$\hat{f}_i(x) = \ldots \qquad (7)$$

where $k_n = [n^{(p-1)/(p+1)}]$ and D is the area of the bounded block into which the observation falls. When the observation falls into an unbounded block, the Nearest Neighbor procedure is used to classify the observation.

Procedures

Each of the 72 combinations of sample size, dimensionality, mean vector, and distribution was used to form classification procedures for the six discrimination rules. Then 500 new observations from each of the populations were generated and classified according to the rules established. Ten iterations of the process were performed when n = 64 or n = 200 and five iterations were performed when n = 729.² The proportion of misclassified observations from each population and the total proportion of misclassified observations were determined and compared.³ All computer programs to generate the data and classification procedures were written in the FORTRAN IV programming language.

Once the proportions of misclassification were obtained, certain hypotheses were tested. For two multivariate normal distributions with identical variance-covariance matrices, the optimal classification procedure is such that the respective probabilities of misclassification are equal (Anderson, 1951). Because of this, the first hypothesis concerned the empirical probabilities of misclassification from each population for each of the six procedures ($H_0$: P[1/2] = P[2/1]).

The second hypothesis concerned the overall probability of misclassification for the procedures. For each of the parameter combinations, the hypothesis of equality of the six proportions was tested. When the hypothesis was rejected, the Marascuilo (1966) analogue to the Scheffé multiple comparison theorem was used to determine significant pairwise contrasts.

A similar testing procedure was used to determine if significant differences existed between the overall proportions of misclassification for the three sample sizes.

² Because of the computer time necessary for the iterations when n = 729, and because of a computer cost factor, it was not feasible to perform more than five iterations.

³ P(I/J) is the proportion of observations from $S_J$ misclassified into $S_I$.

Results

Log Normal Distribution (p = 2)

The results for the two dimensional Log Normal distributed random variables are presented in Table 3. The Log Normal distribution is an example of a semi-infinite range distribution.⁴

μ = 1. The performance of the LDF and QDF was significantly inferior to that of the four nonparametric procedures. However, there was no significant difference between the overall proportions of misclassification for the four nonparametric procedures except when n = 64. When n = 64, the overall proportion of misclassification for the Nearest Neighbor and Gessaman techniques was significantly smaller than for the Parzen-Cacoullos and Loftsgaarden-Quesenberry procedures. For all procedures there was no significant difference between the overall proportions of misclassification when n = 200 or n = 729. However, there were differences between n = 64 and the other two sample sizes. Therefore, a sample size of 200 was necessary for the criterion sample.

μ = 2. The pattern of results when μ = 2 was consistent with the results when μ = 1. The performance of the LDF and the QDF was significantly worse than that of the four nonparametric procedures. However, there were no differences between the misclassification proportions for the four nonparametric procedures. A minimally sufficient criterion sample size was n = 200.

μ = 3. When n = 200 or n = 729, the LDF's performance was significantly worse than that of the other five procedures; however, there was no difference between the proportions of misclassification for the other five procedures. When n = 64, the only procedures that were equivalent were the Nearest Neighbor and Gessaman procedures, and the Parzen-Cacoullos and Loftsgaarden-Quesenberry density estimators. Again, a sample of size n = 200 was sufficiently large.

⁴ In the tables the following abbreviations were used: LDF, Linear Discriminant Function; QDF, Quadratic Discriminant Function; NN, Nearest Neighbor with Probability Blocks; L-Q, Loftsgaarden-Quesenberry Density Estimator; P-C, Parzen-Cacoullos Density Estimator; GESS, Gessaman Density Estimator.
Logit Normal Distribution (p = 2)

The results for the two dimensional Logit Normal distribution are presented in Table 4. The Logit Normal distribution is a member of the family of finite range distributions.

μ = 1. For the two larger sample sizes, there was no significant difference between any of the six procedures. However, only for the Gessaman procedure when n = 200 and the Nearest Neighbor procedure when n = 729 was the hypothesis of equality of the respective proportions of misclassification (P[1/2] and P[2/1]) not rejected. Therefore, for those situations, the Gessaman and Nearest Neighbor procedures were the most desirable. When n = 64, only the LDF and the Loftsgaarden-Quesenberry procedure were significantly different from each other. A sample size of 200 was sufficiently large for the criterion sample.

μ = 2. The results when μ = 2 resembled those when μ = 1. There was no difference among the six overall proportions of misclassification when n = 64 or n = 729. When n = 200, there was a significant difference; however, there were no significant pairwise contrasts. The Nearest Neighbor and the Gessaman procedures were the most desirable to use because the hypothesis of equality of P(1/2) and P(2/1) was not rejected when n = 200 and n = 729 for those procedures. Again, a sample size of 200 was sufficient.

μ = 3. The results were markedly different when μ = 3. When n = 64 and n = 200 the Loftsgaarden-Quesenberry procedure and the QDF were the two best performing procedures; when n = 729 all but the LDF and Parzen-Cacoullos procedures were desirable. For most of the procedures a sample of size 200 was sufficient; in some instances a sample of size 64 was large enough.
Inverse Hyperbolic Sine Normal Distribution (p = 2)

The results for the Monte Carlo simulation for the Inverse Hyperbolic Sine Normal distribution when p = 2 are presented in Table 5. The Inverse Hyperbolic Sine Normal distribution is a member of the infinite range family of distributions.

μ = 1. Except for the QDF, the remaining five procedures were equally effective in classifying the observations for all the samples. Except for the Parzen-Cacoullos density estimator and the Loftsgaarden-Quesenberry density estimator, a sample of size 64 was sufficiently large. For the two density estimators, a sample of size n = 200 was necessary.

μ = 2. The four nonparametric procedures uniformly misclassified fewer observations than the LDF and QDF for all sample sizes. Since the hypothesis of equality of P(2/1) and P(1/2) was not rejected for the Gessaman procedure (n = 64 or 200) and the Nearest Neighbor procedure (n = 64), in those instances, those procedures were the best procedures to use. Except for the LDF, a sample of size n = 200 was sufficient.

μ = 3. For the two larger sample sizes, the four nonparametric procedures were significantly better than the parametric ones, but not significantly different from each other. The Nearest Neighbor and Gessaman procedures were the best to use in this situation because when n = 200 and n = 729, the hypothesis of equality of P(2/1) and P(1/2) was not rejected. The necessary sample size varied with the specific procedure.

Normal Distribution (p = 2)

The results for the two dimensional normally distributed random variables are presented in Table 6. Because all of the assumptions of the LDF are satisfied, it is expected that the LDF would be the optimal procedure in this situation. Hence, the use of the normally distributed random variables serves as a check on the procedures.

μ = 1. As was to be expected, the performance of the LDF approached the optimal probability of misclassification. Additionally, as the sample size increased, the performance of the QDF approached that of the LDF since the sample variance-covariance matrices approached the identity matrix. There was no difference between the performance of any of the procedures, and a sample of size 64 was sufficiently large to develop an efficient classification rule.

μ = 2. Similar to the results when μ = 1, there were no significant differences among the six procedures. Again a sample size of n = 64 was sufficient.

μ = 3. The performance of the procedures was equivalent except for the proportions of the Nearest Neighbor and Gessaman procedures, which were significantly worse than those of the other procedures. Except for the Nearest Neighbor and Gessaman procedures, a sample of size 64 was sufficient; for those two procedures a sample size of 729 was necessary.

Log Normal Distribution (p = 3)

The results for the three dimensional Log Normal random variables are presented in Table 7.

μ = 1. Except for the QDF, the remaining five procedures were equivalent in terms of their overall proportion of misclassified observations. When n = 64, the hypothesis of equality of P(2/1) and P(1/2) was not rejected for the Nearest Neighbor or Gessaman procedures (it was rejected for the other procedures). However, because of the significant differences for both the Nearest Neighbor and Gessaman procedures across the differing sample sizes, it appeared that even with as large a sample size as n = 729, there was instability in the two procedures. For the LDF, QDF, and Parzen-Cacoullos procedures a sample of size 64 was sufficient; for the Loftsgaarden-Quesenberry procedure a sample of size 200 was necessary.

μ = 2. The performance of the four nonparametric procedures was significantly better than that of the two parametric procedures. The hypothesis of equality of P(1/2) and P(2/1) was not rejected for n = 200 or n = 729 for the Nearest Neighbor and Gessaman procedures. For those two procedures a sample of at least 729 was necessary. For the remaining procedures a sample of size 200 was sufficient; for the Loftsgaarden-Quesenberry procedure a sample of size 64 was sufficient.

μ = 3. The Loftsgaarden-Quesenberry procedure was uniformly the best procedure to use while the LDF was uniformly the worst. A sample of n = 64 was sufficient for the Loftsgaarden-Quesenberry procedure. For the remaining procedures a sample of size 200 was sufficient; for the Nearest Neighbor and Gessaman procedures, a sample of size 729 was necessary.
Logit Normal Distribution (p = 3)

The three dimensional Logit Normal results are presented in Table 8.

μ = 1. Unlike the Log Normal distribution's results, all of the proportions of misclassification in this situation were equivalent. Because for the two larger samples the hypothesis of equality of P(2/1) and P(1/2) was not rejected for the Gessaman procedure, that procedure was the optimal procedure to use. When n = 729, the hypothesis was also not rejected for the Loftsgaarden-Quesenberry procedure; therefore, in that situation, the Loftsgaarden-Quesenberry procedure would be desirable to use.

μ = 2. When n = 64 or n = 200 any of the procedures except for the Nearest Neighbor and Gessaman procedures would be appropriate to use. When n = 729, the Nearest Neighbor and Gessaman procedures were the best to use. For all of the procedures except for the Nearest Neighbor and Gessaman procedures a sample of size 64 was sufficient; for those two procedures a sample of size 729 was necessary.

μ = 3. When n = 64, the QDF was the best procedure to use. However, since the proportion of misclassification when n = 64 was significantly worse than when n = 729, a sample of size n = 64 was not large enough. Similarly, when n = 200 either the QDF or the Loftsgaarden-Quesenberry procedure was the best of the six. However, since the proportion of misclassification when n = 200 was different from that when n = 729 for the Loftsgaarden-Quesenberry procedure, a sample of size 200 was not sufficiently large.
Inverse Hyperbolic Sine Normal Distribution (p = 3)

The results of the three dimensional Inverse Hyperbolic Sine Normal distribution are presented in Table 9.

μ = 1. When n = 64 or n = 200 any of the procedures except for the QDF and Parzen-Cacoullos procedures was appropriate for use. When n = 729 any of the procedures except for the QDF was adequate. For the two larger sample sizes, the Nearest Neighbor and Gessaman procedures were the best procedures to use because the hypothesis of equality of the respective proportions of misclassification was not rejected. For the four nonparametric procedures a sample of size 200 was necessary; for the QDF and LDF a sample of size 64 was sufficient.

μ = 2. When n = 64 the Loftsgaarden-Quesenberry procedure was the optimum to use; when n = 200 either the Loftsgaarden-Quesenberry or the Parzen-Cacoullos procedure; when n = 729 the Nearest Neighbor and the Gessaman procedures. The parametric procedures were significantly worse than the nonparametric ones.

μ = 3. The Loftsgaarden-Quesenberry and Parzen-Cacoullos procedures were the best procedures to use in this situation. For this situation, a sample of at least 729 was necessary.
Normal Distribution (p = 3)

The results of the three dimensional normally distributed random variables appear in Table 10.

μ = 1. Because the assumptions of the LDF were satisfied, it was the best procedure to use in this situation. However, similar to the case when p = 2, any of the procedures could be effectively used. A sample of size n = 64 was sufficient.

μ = 2. Again, because the assumptions of the LDF were satisfied, the LDF was the optimal procedure to use. When n = 64 or n = 200, any of the procedures except for the Nearest Neighbor and Gessaman procedures could be effectively substituted for the LDF. When ...

μ = 3. There was no difference between the performance of any of the six procedures. A sample of size n = 64 was sufficient to obtain consistent results with the optimal proportion of misclassification.

Conclusions 

This study has shown that when observations are drawn from non-normal
distributions, certain nonparametric discrimination procedures
classify the observations more appropriately than does the
parametric LDF (or QDF). Even when the data to be classified are
from multivariate normal distributions with equal variance-covariance
matrices, the performance of certain of the nonparametric procedures
parallels that of the parametric procedures. Therefore, usage of
the nonparametric procedures is mandated regardless of the
distribution functions describing the data.

The following are comprehensive summaries and conclusions 
drawn from the results of the study concerning the six types of 
classification procedures under consideration. 
Linear Discriminant Function 

Because of the theoretical development of the Linear Discriminant
Function, the performance of the LDF was best for the data from the
multivariate normal distributions with equal variance-covariance
matrices. For multivariate normally distributed random observations,
a sample size of 64 was sufficient to effect an appropriate
classification rule for the LDF.

For non-normally distributed random data, the use of the LDF was
not appropriate. The LDF's overall proportion of misclassification
was the largest of the six types of classification procedures for the



the Logit Normal distribution, the performance of the LDF was
comparable to the nonparametric procedures. Additionally, for all
three of the non-normal distributions, there was an extreme
inflation-deflation effect concerning the respective proportions of
misclassification for each population; one of the proportions was
much larger than the optimal level, and one was much lower.
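
The inflation-deflation effect can be made concrete by tabulating the
proportion of misclassification separately for each population rather
than only overall. The following is a minimal sketch and not the
study's own code; the function name `per_population_error` and the toy
labels are hypothetical and are not data from the study.

```python
def per_population_error(true_labels, predicted_labels):
    """Return {population: proportion of its observations misclassified}."""
    counts = {}
    errors = {}
    for truth, guess in zip(true_labels, predicted_labels):
        counts[truth] = counts.get(truth, 0) + 1
        if guess != truth:
            errors[truth] = errors.get(truth, 0) + 1
    return {pop: errors.get(pop, 0) / counts[pop] for pop in counts}


# Hypothetical toy labels: ten observations from each of two populations.
truth = [1] * 10 + [2] * 10
guess = [1] * 9 + [2] * 1 + [1] * 6 + [2] * 4
rates = per_population_error(truth, guess)
overall = sum(t != g for t, g in zip(truth, guess)) / len(truth)
```

In this toy case the overall proportion (7 of 20) conceals that one
population is misclassified far more often than the other, which is
the pattern the text describes for the LDF.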

For data whose distribution is unknown, it would be unwise to
make use of the Linear Discriminant Function to classify the
observations.

Quadratic Discriminant Function 

The Quadratic Discriminant Function is the analogue to the LDF
when the variance-covariance matrices are unequal. For the results
of the study based on the multivariate normally distributed random
variables, the observations were drawn from multivariate normal
distributions with equal variance-covariance matrices. Since the
QDF, given equal variance-covariance matrices, becomes the LDF, the
performance of the QDF paralleled that of the LDF, especially as
the sample size increased for the criterion sample (since the
estimated parameters more closely resemble the population parameters).
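
The reduction of the QDF to the LDF under equal variance-covariance
matrices can be checked numerically. The sketch below is an
illustration under stated assumptions, not the study's code: it is
restricted to p = 2, assumes equal prior probabilities, uses known
rather than estimated parameters, and all function names are
hypothetical.

```python
import math


def invert_2x2(m):
    """Return (inverse, determinant) of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def quad_form(x, mean, inv):
    """Quadratic form (x - mean)' inv (x - mean)."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))


def qdf_score(x, mean1, cov1, mean2, cov2):
    """Log ratio of two bivariate normal densities; positive favors population 1."""
    inv1, det1 = invert_2x2(cov1)
    inv2, det2 = invert_2x2(cov2)
    return (0.5 * (quad_form(x, mean2, inv2) - quad_form(x, mean1, inv1))
            + 0.5 * math.log(det2 / det1))


def ldf_score(x, mean1, mean2, cov):
    """Linear score (x - midpoint)' inv(cov) (mean1 - mean2); positive favors population 1."""
    inv, _ = invert_2x2(cov)
    mid = [(mean1[i] + mean2[i]) / 2.0 for i in range(2)]
    diff = [mean1[i] - mean2[i] for i in range(2)]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])


# Hypothetical parameters: a common covariance matrix for both populations.
mean1, mean2 = [0.0, 0.0], [1.0, 1.0]
cov = [[1.0, 0.5], [0.5, 2.0]]
x = [0.3, -0.7]
gap = abs(qdf_score(x, mean1, cov, mean2, cov) - ldf_score(x, mean1, mean2, cov))
```

With a common covariance matrix the quadratic terms in x cancel and
the log-determinant term vanishes, so `gap` is zero up to rounding.
With estimated matrices the two scores differ only through sampling
error, which is consistent with the QDF approaching the LDF as the
criterion sample grows.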

As with the LDF, it would be unwise to use the QDF to classify
observations from unknown distributions.
Nearest Neighbor Procedure with Probability Squares 

When the dimensionality was two, the Nearest Neighbor procedure
performed well for all of the distributions. There was no discernible
difference between the performance of the Nearest Neighbor procedure
for any of the distributions.

When p = 3, the performance of the Nearest Neighbor procedure
declined significantly. This decline was not due to the mode of
classification of the procedure, nor to an inherent fault in the
procedure, but instead due to the development of the probability
blocks and the size of the criterion sample.

The number of probability blocks is a function of the sample
size; the specific function was established so that there is a
sufficient range of observations in each block. When p = 2, the block
development function was sufficient (there were enough observations
in each block so that the range of observations that would belong
to a particular block was widespread). Hence, the Nearest Neighbor
procedure when p = 2 was well developed. However, when p = 3,
because of the desire to maintain the same three criterion sample
sizes that were used when p = 2, there was no block development
function that would effect as appropriate a set of probability
blocks as when p = 2. For this reason, the Nearest Neighbor
procedure when p = 3 was less effective than when p = 2.

When p = 2, in most cases there was a significant difference
between the mean proportion when n = 64 and that when n = 200 or
n = 729. For this reason, the use of the procedure for small samples
may not be desirable. However, because of the desirable property
that the Nearest Neighbor procedure is completely distribution-free,
because of the relative ease with which it can be developed, and
because of its generally good performance, the use of the Nearest
Neighbor procedure is appropriate and suggested for any type of
classification problem for which the underlying distribution is
unknown.

Parzen-Cacoullos Density Estimator

approximates that of the LDF and QDF since the Parzen-Cacoullos
estimator is asymptotically a multivariate normal density (Parzen,
1962). However, because of its nonparametric features, the
performance of the Parzen-Cacoullos procedure was somewhat better than
that of the LDF and QDF.

The performance of the Parzen-Cacoullos density estimator was
equivalent for each of the three non-normal distributions, and, as
was to be expected, its performance was best for the multivariate
normally distributed random variables. For the non-normal
distributions, the performance of the Parzen-Cacoullos density
estimator procedure was not significantly different from that of
the other nonparametric procedures.
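
For readers unfamiliar with kernel density classification, the
following is a minimal sketch of the general idea behind a
Parzen-type estimator. It is not the study's implementation: the
spherical Gaussian kernel, the single smoothing parameter `h`, the
equal prior probabilities, and the function names are all assumptions
made for illustration, since this section does not state the kernel
or smoothing parameter that was used.

```python
import math


def parzen_density(x, sample, h):
    """Kernel density estimate at x using a spherical Gaussian kernel of bandwidth h."""
    p = len(x)
    n = len(sample)
    total = 0.0
    for point in sample:
        squared = sum(((xj - sj) / h) ** 2 for xj, sj in zip(x, point))
        total += math.exp(-0.5 * squared)
    # Divide by n and by the normalizing constant of the p-dimensional kernel.
    return total / (n * (h * math.sqrt(2.0 * math.pi)) ** p)


def parzen_classify(x, sample1, sample2, h):
    """Assign x to the population with the larger estimated density (equal priors assumed)."""
    return 1 if parzen_density(x, sample1, h) >= parzen_density(x, sample2, h) else 2


# Hypothetical criterion samples, not data from the study.
sample1 = [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)]
sample2 = [(3.0, 3.0), (2.5, 3.5), (3.5, 2.5)]
label_near_first = parzen_classify((0.2, 0.1), sample1, sample2, 1.0)
label_near_second = parzen_classify((2.8, 3.1), sample1, sample2, 1.0)
```

The choice of `h` controls how smooth the estimate is; a value that is
too large blurs the two populations together, and one that is too
small makes the estimate depend on individual sample points.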

Loftsgaarden-Quesenberry Density Estimator

In addition to the Nearest Neighbor procedure, the
Loftsgaarden-Quesenberry procedure was that procedure of the six which
most effectively classified the observations. There was no
difference between the discriminatory power of the
Loftsgaarden-Quesenberry procedure for any of the four distributions.
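
As an illustration of this kind of procedure, the sketch below uses
the commonly stated form of the k-th nearest neighbor density
estimate, (k - 1) / (n V), where V is the volume of the ball centered
at x that just reaches the k-th nearest sample point. It is a sketch
under assumptions rather than the study's implementation: the value
of `k`, the Euclidean distance, the equal prior probabilities, and
the function names are hypothetical choices not given in this section.

```python
import math


def lq_density(x, sample, k):
    """k-th nearest neighbor density estimate at x: (k - 1) / (n * ball volume)."""
    p = len(x)
    n = len(sample)
    distances = sorted(math.dist(x, point) for point in sample)
    r_k = distances[k - 1]  # distance to the k-th nearest sample point
    unit_ball = math.pi ** (p / 2.0) / math.gamma(p / 2.0 + 1.0)
    # Note: r_k is zero (and this divides by zero) if x coincides with k sample points.
    return (k - 1) / (n * unit_ball * r_k ** p)


def lq_classify(x, sample1, sample2, k):
    """Assign x to the population with the larger estimated density (equal priors assumed)."""
    return 1 if lq_density(x, sample1, k) >= lq_density(x, sample2, k) else 2


# Hypothetical criterion samples on small grids, not data from the study.
grid1 = [(0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
grid2 = [(a + 3.0, b + 3.0) for a, b in grid1]
label_near_first = lq_classify((0.1, 0.1), grid1, grid2, 5)
label_near_second = lq_classify((2.9, 3.1), grid1, grid2, 5)
```

Because the ball expands until it contains k sample points, the
estimate adapts its smoothing to the local density of the criterion
sample without requiring a separately chosen bandwidth.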

For the Nearest Neighbor procedure, when n = 64, the mean
proportion of misclassified observations was in general significantly
different than when n = 200 or n = 729, suggesting that a criterion
sample size of 64 was not sufficient for the Nearest Neighbor
procedure. However, the mean proportion of misclassification
for the Loftsgaarden-Quesenberry procedure when n = 64 was not
different from the mean proportion when n = 200 or n = 729.

When the criterion sample is small, the use of the Loftsgaarden-
to obtain a criterion, either the Nearest Neighbor or the
Loftsgaarden-Quesenberry procedure would suffice.
Gessaman Density Estimator 

The Gessaman procedure presupposes the existence of probability 
blocks. Therefore, because the Nearest Neighbor with probability 
blocks procedure has been shown to be such an effective classifica- 
tion procedure regardless of the type of distribution from which the 
observations come, it would appear that the use of the Gessaman 
procedure is unnecessary. 

For the instances in which the distributions under consideration
were widely separated, the Gessaman procedure became almost
identical to the Nearest Neighbor procedure; for the cases in which
the probabilities of misclassification for the Gessaman procedure
were different from those for the Nearest Neighbor procedure, at no
time was the mean proportion of misclassification for the Gessaman
procedure significantly less than the mean proportion of
misclassification for the Nearest Neighbor procedure.

Summary

In general, based on the results of this study, the Nearest
Neighbor and Loftsgaarden-Quesenberry classification procedures
were the two types of procedures which uniformly best classified
observations from unknown distributions. These two discrimination
techniques should be considered as viable alternatives to the
parametric Linear Discriminant Function, especially when the
distributions of the observations are unknown.